computational complexity
Bounds on the computational complexity of neurons due to dendritic morphology
The simple linear threshold units used in many artificial neural networks have a limited computational capacity. Famously, a single unit cannot handle nonlinearly separable problems like XOR. In contrast, real neurons exhibit complex morphologies as well as active dendritic integration, suggesting that their computational capacities outperform those of simple linear units. Considering specific families of Boolean functions, we empirically examine the computational limits of single units that incorporate more complex dendritic structures. For random Boolean functions, we show that there is a phase transition in learnability as a function of the input dimension, with most random functions below a certain critical dimension being learnable and those above not.
Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations
SO(3)-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the ClebschGordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks in which CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of SO(3)-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the O(L3)CG paths into a single shared parameter set without compromising equivariance, where L is the maximum angular degree.
Continuous-time Riemannian SGD and SVRGFlows on Wasserstein Probabilistic Space
Recently, optimization on the Riemannian manifold have provided valuable insights to the optimization community. In this regard, extending these methods to to the Wasserstein space is of particular interest, since optimization on Wasserstein space is closely connected to practical sampling processes. Generally, the standard (continuous) optimization method on Wasserstein space is Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the family of continuous optimization methods in the Wasserstein space, by extending the gradient flow on it into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow. By leveraging the property of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete Euclidean dynamics of the desired Riemannian stochastic methods. Then, we obtain the flows in Wasserstein space by Fokker-Planck equation. Finally, we establish convergence rates of the proposed stochastic flows, which align with those known in the Euclidean setting.
Nyström-Accelerated Primal LS-SVMs: Breaking the O(an3) Complexity Bottleneck for Scalable ODEs Learning
A major problem of kernel-based methods (e.g., least squares support vector machines, LS-SVMs) for solving linear/nonlinear ordinary differential equations (ODEs) is the prohibitive O(an3) (a = 1 for linear ODEs and 27 for nonlinear ODEs) part of their computational complexity with increasing temporal discretization points n. We propose a novel Nyström-accelerated LS-SVMs framework that breaks this bottleneck by reformulating ODEs as primal-space constraints. Specifically, we derive for the first time an explicit Nyström-based mapping and its derivatives from one-dimensional temporal discretization points to a higher m-dimensional feature space (1 < m n), enabling the learning process to solve linear/nonlinear equation systems with m-dependent complexity. Numerical experiments on sixteen benchmark ODEs demonstrate: 1) 10 6000 times faster computation than classical LS-SVMs and physics-informed neural networks (PINNs), 2) comparable accuracy to LS-SVMs (< 0.13% relative MAE, RMSE, and y ˆy difference) while maximum surpassing PINNs by 72% in RMSE, and 3) scalability to n = 104 time steps with m = 50features. This work establishes a new paradigm for efficient kernel-based ODEs learning without significantly sacrificing the accuracy of the solution.
CodeGEMM: ACodebook-Centric Approach to Efficient GEMM in Quantized LLMs
Weight-only quantization is widely used to mitigate the memory-bound nature of LLM inference. Codebook-based methods extend this trend by achieving strong accuracy in the extremely low-bit regime (e.g., 2-bit). However, current kernels rely on dequantization, which repeatedly fetches centroids and reconstructs weights, incurring substantial latency and cache pressure. We present CodeGEMM, a codebook-centric GEMM kernel that replaces dequantization with precomputed inner products between centroids and activations stored in a lightweight Psumbook. At inference, code indices directly gather these partial sums, eliminating per-element lookups and reducing the on-chip footprint. The kernel supports the systematic exploration of latency-memory-accuracy trade-offs under a unified implementation. On Llama-3 models, CodeGEMM delivers 1.83 (8B) and 8.93 (70B) speedups in the 2-bit configuration compared to state-of-the-art codebookbased quantization at comparable accuracy and further improves computing efficiency and memory subsystem utilization.
An Adaptive Quantum Circuit of Dempster's Rule of Combination for Uncertain Pattern Classification
In pattern classification, efficient uncertainty reasoning plays a critical role, particularly in real-time applications involving noisy data, ambiguous class boundaries, or overlapping categories. Leveraging the advanced computational power of quantum computing, an Adaptive Quantum Circuit for Dempster's Rule of Combination (AQC-DRC) is proposed to address efficient classification under uncertain environments. The AQC-DRC is developed within the framework of quantum evidence theory (QET) and facilitates decision-making based on quantum basic probability and plausibility levels, which is a generalized Bayesian inference method. The AQC-DRC provides a deterministic computation of DRC, ensuring that quantum fusion outcomes in uncertain pattern classification are exactly aligned with those of the classical method, while simultaneously achieving exponential reductions in the computational complexity of evidence combination and significantly improving fusion efficiency. It is founded that the quantum basic probability amplitude function in QET, as a generalized quantum probability amplitude, can be naturally utilized to express the quantum amplitude encoding. In addition, the quantum basic probability in QET, as a generalized quantum probability, naturally forms a quantum basic probability distribution and can be used to represent quantum measurement outcomes for quantum basic probability level decision-making. Furthermore, the quantum plausibility function in QET also can be naturally used to express the quantum measurement outcomes for quantum plausibility level decision-making. These findings enrich the physical understanding of quantum amplitude encoding and quantum measurement outcomes, offering broad application prospects for representing and processing uncertain knowledge in pattern classification.
Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning
Following the pivotal success of learning strategies to win at tasks, solely by interacting with an environment without any supervision, agents have gained the ability to make sequential decisions in complex MDPs. Yet, reinforcement learning policies face exponentially growing state spaces in high dimensional MDPs resulting in a dichotomy between computational complexity and policy success. In our paper we focus on the agent's interaction with the environment in a high-dimensional MDP during the learning phase and we introduce a theoretically-founded novel paradigm based on experiences obtained through counteractive actions. Our analysis and method provide a theoretical basis for efficient, effective, scalable and accelerated learning, and further comes with zero additional computational complexity while leading to significant acceleration in training. We conduct extensive experiments in the Arcade Learning Environment with high-dimensional state representation MDPs. The experimental results further verify our theoretical analysis, and our method achieves significant performance increase with substantial sample-efficiency in high-dimensional environments.
e150e6d0a1e5214740c39c6e4503ba7a-Supplemental-Conference.pdf
Appendix382 AAdditional Experiments3383 A.1 Experiments on the ETT datasets384 In the main body, we present a comparison of the benchmark methods on the ETTm2 dataset. In this385 section, we extend our analysis to the remaining three ETT datasets, namely ETTh1, ETTh2, and386 ETTm1, as summarized in Table 7. Our experimental results reveal that Basisformer outperforms all387 other methods in terms of MSE and MAE. In all experiments, lower MSE values indicate better model performance, and we present the best results in boldface. Experimental results with longer length input setting391 Throughout our research, we maintain consistency in our experimental settings by fixing the input392 length to be 96(with a reduced input length of 36for the illness dataset), instead of using a longer393 length.